This repository has been archived by the owner on Jan 20, 2024. It is now read-only.
Assignment probabilities based on segment sizes #6
We've been using this tool to create segments for email campaigns, but have run into an issue when using it to generate segments of very different sizes.
The current code loops over parent campaign members in the order they come out of the underlying datastore and attempts to assign each one to any of the groups with equal probability. In practice, this means small groups fill up first (presumably with older contacts who have lower primary keys) and larger groups end up over-representing contacts who are assigned later (presumably newer contacts with larger primary keys). It also hurts performance: the code keeps attempting to assign members to the small segments even after they've filled up, resulting in many recursive calls to `assignMember()`.

This patch assigns a probability to each segment based on its relative size and then makes assignments based on those probabilities. As a result, members are added to smaller segments at a slower rate than to the larger segments, providing a more even distribution of assignments relative to the initial ordering of the contacts and fewer recursive calls.
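
For readers who want to see the idea in isolation, here is a minimal Python sketch of size-weighted assignment. It is not the code in this patch: the function name `assign_by_segment_size`, its parameters, and the dict-based data shapes are all hypothetical, and it makes one assumption beyond the description above, namely that full segments are dropped from the draw so no retry or recursion is needed.

```python
import random


def assign_by_segment_size(member_ids, segment_sizes, rng=None):
    """Assign members to segments with probability proportional to segment size.

    member_ids: iterable of member identifiers, in datastore order.
    segment_sizes: dict mapping segment name -> target size (hypothetical shape).
    rng: optional random.Random instance, useful for reproducible runs.
    """
    rng = rng or random.Random()
    assignments = {name: [] for name in segment_sizes}

    for member_id in member_ids:
        # Assumption: only segments that still have room take part in the draw,
        # so a pick never has to be retried.
        open_segments = [
            name for name, size in segment_sizes.items()
            if len(assignments[name]) < size
        ]
        if not open_segments:
            break  # every segment is full; remaining members stay unassigned

        # Weight each open segment by its target size, so a segment ten times
        # larger is picked roughly ten times as often.
        weights = [segment_sizes[name] for name in open_segments]
        chosen = rng.choices(open_segments, weights=weights, k=1)[0]
        assignments[chosen].append(member_id)

    return assignments
```

With segments of size 10 and 90 and 100 members, a small segment receives on average about one member in ten throughout the pass instead of filling up within the first few dozen members, which is the more even spread across the initial ordering described above.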